a and b
c
d and e
Least squares estimates
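body = read.csv("Bodyfat.csv")
# Design matrix X: intercept, Age, Weight, Height, plus a fifth column that is
# an exact linear combination of Age, Weight and Height (so X is rank deficient)
xmat = matrix(0, nrow(body), 5)
xmat[,1] = rep(1, nrow(body))
xmat[,2] = body$Age
xmat[,3] = body$Weight
xmat[,4] = body$Height
xmat[,5] = body$Age + 10*body$Weight + 3*body$Height
# lm() detects that the last term is aliased and drops it from the fit
lmod1 = lm(bodyfat ~ Age + Weight + Height + I(Age + 10*Weight + 3*Height), data = body)
summary(lmod1)$coefficients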
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.7673848 7.47935066 2.375525 1.828479e-02
## Age 0.1697902 0.02956033 5.743853 2.698997e-08
## Weight 0.1981519 0.01312664 15.095402 5.829476e-37
## Height -0.5943339 0.10690038 -5.559698 6.972685e-08
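Using the lm function, the least squares estimates are BODYFAT \(= 17.7673848 + 0.1697902\,Age + 0.1981519\,Weight - 0.5943339\,Height\). The fifth term does not appear in the coefficient table because lm drops it as aliased.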
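Because the fifth column adds nothing beyond the first four, the same estimates can also be obtained by solving the normal equations \(X^TX\beta = X^Ty\) on the full-rank part of the design. The sketch below uses simulated data (the variables `age`, `weight`, `height`, `y` and the coefficients used to generate them are made up; it does not read Bodyfat.csv) and checks that the normal-equation solution matches `lm()`.

```r
# Simulated stand-in for Bodyfat.csv (made-up data, not the real file)
set.seed(1)
n <- 60
age    <- runif(n, 20, 80)
weight <- runif(n, 120, 260)
height <- runif(n, 60, 76)
y      <- 10 + 0.2 * age + 0.15 * weight - 0.5 * height + rnorm(n, sd = 5)

# Full-rank part of the design: intercept, Age, Weight, Height
X4 <- cbind(1, age, weight, height)

# Solve the normal equations (X'X) b = X'y
b_normal <- drop(solve(crossprod(X4), crossprod(X4, y)))

# lm() should give the same estimates
b_lm <- coef(lm(y ~ age + weight + height))
print(cbind(normal_eq = b_normal, lm = b_lm))
```

Forming \(X^TX\) squares the condition number, so `lm()` (which uses a QR decomposition) is the numerically safer choice; the normal equations are shown only to make the algebra explicit.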
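SVD
library(pracma)  # pinv(), Rank() and nullspace() are provided by the pracma package
# Moore-Penrose pseudo-inverse of X (computed from the SVD)
pseudo_inverse = pinv(xmat)
# Minimum-norm least squares estimates of beta0 through beta4
esti = pseudo_inverse %*% body$bodyfat
esti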
## [,1]
## [1,] 17.767384801
## [2,] 0.166472102
## [3,] 0.164971028
## [4,] -0.604288101
## [5,] 0.003318082
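For readers without pracma, the pseudo-inverse can be built directly from base R's `svd()`: if \(X = UDV^T\), then \(X^+ = VD^+U^T\), where \(D^+\) inverts only the nonzero singular values. The helper `pinv_svd()` below is hypothetical (written for this sketch, not a package function) and the data are simulated with the same column structure as `xmat`. The sketch checks that the resulting minimum-norm solution gives the same fitted values as `lm()`.

```r
# Moore-Penrose pseudo-inverse from the SVD: X = U D V'  =>  X+ = V D+ U'
# pinv_svd() is a hypothetical helper written for this sketch (base R only)
pinv_svd <- function(X, tol = max(dim(X)) * .Machine$double.eps) {
  s <- svd(X)
  keep <- s$d > tol * s$d[1]   # singular values treated as nonzero
  # t(U_k) / d divides row i of t(U_k) by d[i], i.e. forms D_k^{-1} U_k'
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Simulated stand-in for Bodyfat.csv (made-up data, not the real file)
set.seed(1)
n <- 60
age    <- runif(n, 20, 80)
weight <- runif(n, 120, 260)
height <- runif(n, 60, 76)
y      <- 10 + 0.2 * age + 0.15 * weight - 0.5 * height + rnorm(n, sd = 5)
X <- cbind(1, age, weight, height, age + 10 * weight + 3 * height)

P <- pinv_svd(X)
b_min <- P %*% y        # minimum-norm least squares solution (5 x 1)
print(drop(b_min))

# Same fitted values as lm(), which drops the aliased column instead
fit <- lm(y ~ age + weight + height)
max(abs(drop(X %*% b_min) - fitted(fit)))
```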
# X has rank 4, not 5, because the last column is a linear combination of the previous columns
Rank(xmat)
## [1] 4
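The rank can also be checked with base R alone, either by counting singular values that are not numerically zero or by reading the rank from the pivoted QR decomposition. A minimal sketch on a simulated design with the same structure as `xmat` (made-up data; the tolerance choice for the SVD count is an assumption, a common default):

```r
# Simulated design with the same structure as xmat (made-up data)
set.seed(2)
n <- 40
age <- runif(n, 20, 80); weight <- runif(n, 120, 260); height <- runif(n, 60, 76)
X <- cbind(1, age, weight, height, age + 10 * weight + 3 * height)

# Rank = number of singular values that are not numerically zero
d <- svd(X)$d
rank_svd <- sum(d > max(dim(X)) * .Machine$double.eps * d[1])

# Base R alternative: rank reported by the pivoted QR decomposition
rank_qr <- qr(X)$rank

c(svd = rank_svd, qr = rank_qr)
```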
# Find the vector that spans the null space of X
x5 = nullspace(xmat)
# The null space is orthogonal to the row space of X, so X %*% x5 = 0 and
# adding x5 to esti gives another least squares solution with the same fitted values
esti + x5
## [,1]
## [1,] 17.76738480
## [2,] 0.07155630
## [3,] -0.78418697
## [4,] -0.88903550
## [5,] 0.09823388
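The null space can also be read off the SVD: the right singular vectors whose singular values are (numerically) zero span \(N(X)\). For this design it is spanned by \((0, 1, 10, 3, -1)/\sqrt{111}\), up to sign, since \(X(0,1,10,3,-1)^T = Age + 10\,Weight + 3\,Height - (Age + 10\,Weight + 3\,Height) = 0\). The sketch below uses simulated data (made up, same structure as `xmat`) and confirms that shifting a least squares solution along this vector changes the coefficients but not the fitted values.

```r
# Simulated design with the same structure as xmat (made-up data)
set.seed(3)
n <- 40
age <- runif(n, 20, 80); weight <- runif(n, 120, 260); height <- runif(n, 60, 76)
y <- 10 + 0.2 * age + 0.15 * weight - 0.5 * height + rnorm(n, sd = 5)
X <- cbind(1, age, weight, height, age + 10 * weight + 3 * height)

s <- svd(X)
tol <- max(dim(X)) * .Machine$double.eps * s$d[1]
keep <- s$d > tol
# Right singular vectors with (numerically) zero singular values span N(X)
null_basis <- s$v[, !keep, drop = FALSE]
print(drop(null_basis))   # proportional to (0, 1, 10, 3, -1), up to sign

# Minimum-norm least squares solution, then shift it along the null space
b_min <- s$v[, keep] %*% (crossprod(s$u[, keep], y) / s$d[keep])
b_alt <- b_min + 2.5 * null_basis

# Different coefficient vectors, identical fitted values
print(cbind(b_min, b_alt))
max(abs(X %*% b_min - X %*% b_alt))
```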
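x5 spans the null space of X, which is the orthogonal complement of the row space of X. A linear function \(\lambda^T\beta\) is estimable if and only if \(\lambda\) lies in the row space of X; since the null space is one-dimensional and spanned by x5, this is equivalent to \(\lambda \cdot x5 = 0\).
\[\beta_1 =\lambda^T\beta= (0,1,0,0,0)\beta, \qquad \text{so}\ \lambda= \begin{pmatrix} 0\\ 1\\ 0\\ 0\\ 0 \end{pmatrix} \quad \text{and} \quad \lambda \cdot x5 = -0.0949158 \neq 0 \]
This means that \(\lambda\) is not in the row space of X, so \(\beta_1\) is not estimable. The output above shows the same thing: esti and esti + x5 are both least squares solutions, yet they give different values for \(\beta_1\) (0.166472102 and 0.07155630).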
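The estimability test above can be packaged as a small check: project \(\lambda\) onto a basis of \(N(X)\) and see whether anything is left. The function `is_estimable()` is hypothetical (written for this sketch), assumes `nrow(X) >= ncol(X)` so that `svd()` returns a full set of right singular vectors, and is run on simulated data with the same structure as `xmat`.

```r
# Estimability check: lambda' beta is estimable iff lambda is orthogonal to the
# null space of X (equivalently, lambda lies in the row space of X).
# is_estimable() is a hypothetical helper; it assumes nrow(X) >= ncol(X)
is_estimable <- function(lambda, X, tol = 1e-8) {
  s <- svd(X)
  zero <- s$d <= max(dim(X)) * .Machine$double.eps * s$d[1]
  N <- s$v[, zero, drop = FALSE]            # basis of the null space of X
  all(abs(crossprod(N, lambda)) <= tol * sqrt(sum(lambda^2)))
}

# Simulated design with the same structure as xmat (made-up data)
set.seed(5)
n <- 40
age <- runif(n, 20, 80); weight <- runif(n, 120, 260); height <- runif(n, 60, 76)
X <- cbind(1, age, weight, height, age + 10 * weight + 3 * height)

is_estimable(c(0, 1, 0, 0, 0), X)   # beta1 alone: FALSE
is_estimable(c(0, 1, 0, 0, 1), X)   # beta1 + beta4: TRUE
is_estimable(c(0, 0, 1, 0, 10), X)  # beta2 + 10 beta4: TRUE
is_estimable(c(0, 0, 0, 1, 3), X)   # beta3 + 3 beta4: TRUE
```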
\[\text{We assumed the model }\\ BodyFat = \beta_0 +\beta_1 Age +\beta_2 Weight+\beta_3 Height+\beta_4(Age+10\cdot Weight+3\cdot Height)\\ \text{which can be rewritten as }\\ BodyFat = \beta_0 +(\beta_1+\beta_4) Age + (\beta_2+10\beta_4)Weight+(\beta_3+3\beta_4)Height\\ \text{The lm fit in part a estimates exactly these coefficients, so the least squares estimates are }\\ \hat\beta_0 = 17.7673848\\ \hat\beta_1+\hat\beta_4 = 0.1697902 \\ \hat\beta_2+10\hat\beta_4 = 0.1981519 \\ \hat\beta_3+3\hat\beta_4 = -0.5943339 \]
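As a check using only the numbers printed above, plugging either least squares solution (esti, or esti + x5) into these combinations gives the same values, which agree with the lm estimates up to rounding:

\[
\begin{array}{lcc}
 & \text{esti} & \text{esti + x5}\\
\beta_0 & 17.767384801 & 17.76738480\\
\beta_1+\beta_4 & 0.166472102+0.003318082=0.169790184 & 0.07155630+0.09823388=0.16979018\\
\beta_2+10\beta_4 & 0.164971028+0.03318082=0.198151848 & -0.78418697+0.9823388=0.19815183\\
\beta_3+3\beta_4 & -0.604288101+0.009954246=-0.594333855 & -0.88903550+0.29470164=-0.59433386
\end{array}
\]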
Yes, we can read the estimates off directly from part c.
summary(lm(bodyfat ~ Age + Weight + Height + I(Age + 10*Weight +
3*Height), data = body))
##
## Call:
## lm(formula = bodyfat ~ Age + Weight + Height + I(Age + 10 * Weight +
## 3 * Height), data = body)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.3960 -4.5038 -0.0326 3.8324 15.7154
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.76738 7.47935 2.376 0.0183 *
## Age 0.16979 0.02956 5.744 2.70e-08 ***
## Weight 0.19815 0.01313 15.095 < 2e-16 ***
## Height -0.59433 0.10690 -5.560 6.97e-08 ***
## I(Age + 10 * Weight + 3 * Height) NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.809 on 248 degrees of freedom
## Multiple R-squared: 0.524, Adjusted R-squared: 0.5182
## F-statistic: 90.99 on 3 and 248 DF, p-value: < 2.2e-16
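If a column is an exact linear combination of the other columns, lm drops that column: its coefficient is reported as NA ("1 not defined because of singularities"). The remaining coefficients are then the estimates of the estimable combinations \(\beta_0\), \(\beta_1+\beta_4\), \(\beta_2+10\beta_4\) and \(\beta_3+3\beta_4\).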
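This behaviour can be seen directly with `coef()` and `alias()` from the stats package. The sketch below uses simulated data (made up, same structure as the Bodyfat design) and compares the full fit with a fit that leaves out the redundant term: the aliased coefficient is `NA`, and the other coefficients and fitted values are the same as in the reduced model.

```r
# Simulated stand-in for Bodyfat.csv (made-up data, not the real file)
set.seed(4)
n <- 40
age <- runif(n, 20, 80); weight <- runif(n, 120, 260); height <- runif(n, 60, 76)
y <- 10 + 0.2 * age + 0.15 * weight - 0.5 * height + rnorm(n, sd = 5)

fit_full    <- lm(y ~ age + weight + height + I(age + 10 * weight + 3 * height))
fit_reduced <- lm(y ~ age + weight + height)

coef(fit_full)     # last coefficient is NA because the term is aliased
alias(fit_full)    # expresses the aliased term as a combination of the others
coef(fit_reduced)  # same values as the non-NA coefficients above
```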